Nature Cancer — Latest Matching Preprints

1

Cell-Free DNA Genomic and Fragmentomic Features for Early Outcome Prediction in Large B-Cell Lymphoma.

Wang, S.; Mapar, P.; Moldovan, N.; van der Pol, Y.; Safrastyan, A.; van Werkhoven, E.; Tantyo, N. A.; Snieder, B.; Do Brito Valente, A. F.; de Jong, A. V.; Dinmohamed, A.; Drees, E. E. E.; Roemer, M. G. M.; Ylstra, B.; Klerk, C. P. W.; Strobbe, L.; Sandberg, Y.; Boersma, R. S.; Koene, H.; Pruijt, H.; de Heer, K.; van Rijn, R.; Bilgin, Y. M.; de Jongh, E.; Nijland, M.; van der Poel, M.; Koster, A.; Nieuwenhuizen, L.; Fijnheer, R.; Beeker, A.; Mous, R.; Vergote, V. K. J.; Vermaat, J. S. P.; Pegtel, D. M.; Chamuleau, M. E. D.; Mouliere, F.

2026-05-30 oncology 10.64898/2026.05.29.26353426 medRxiv

Top 0.1%

10.3%

Curative-intent immunochemotherapy fails in ~30% of patients with large B-cell lymphoma (LBCL), yet no validated molecular tool enables early identification of high-risk individuals to guide treatment intensification. Using shallow whole-genome sequencing (sWGS) of plasma cell-free DNA from 190 LBCL patients, we developed and validated the ACT score (Aberrations, fragment Composition, Terminal motifs), a composite classifier integrating genomic and fragmentomic features from a single post-cycle-1 sample. ACT-positive patients had worse 2-year outcomes versus ACT-negative patients: time-to-progression 29% vs. 83% (HR 4.4, 95% CI 1.9-10.0; P = 1.5 x 10^-4) and overall survival 47% vs. 93% (HR 8.7, 95% CI 3.0-25.4; P = 1.8 x 10^-6). The ACT score was prognostic independently of the International Prognostic Index, and their combination identified the highest-risk patients. Unlike mutation-based approaches, this assay requires neither tumor tissue, a germline control, nor a baseline plasma sample. Built on open-source tools and sWGS, the ACT score offers a feasible, scalable strategy for early risk stratification in aggressive LBCL.

2

Pre-infusion exhaled breath volatile organic compounds predict severe CRS and ICANS after CAR T-cell therapy

Berna, A.; Fahrmann, J.; Irajizad, E.; Rudsari, H.; Liu, Y.; Logan, J.; Murtada, K.; Grandy, J.; Edwards, M.; Ayers, A.; Ahmed, S.; Neelapu, S.; Saini, N.; John, A.; John, T.

2026-06-01 oncology 10.64898/2026.05.28.26354352 medRxiv

Top 0.4%

3.1%

Background: Severe cytokine release syndrome (CRS) and immune effector cell-associated neurotoxicity syndrome (ICANS) are major dose-limiting toxicities of chimeric antigen receptor (CAR) T-cell therapy. Existing pre-infusion biomarkers offer modest discrimination, motivating non-invasive alternatives. Methods: We prospectively enrolled 26 patients with relapsed/refractory large B-cell lymphoma receiving axicabtagene ciloleucel. Pre-infusion (day -1) exhaled breath samples were analyzed by gas chromatography-mass spectrometry for 40 volatile organic compounds (VOCs). Candidates with univariate AUC > 0.65 for severe (grade >=2) CRS or ICANS were carried forward to sensitivity-maximization-at-given-specificity with LASSO regularization (SMAGS-LASSO), which selected separate panels for each outcome. Model performance was assessed by leave-one-out cross-validation with permutation p-values and Harrell bootstrap optimism correction. Results: The 4-VOC CRS panel (heptanal, benzaldehyde, 2-butanone, ethylbenzene) achieved LOOCV AUC 82.5% (80% sensitivity at 88% specificity) and the 3-VOC ICANS panel (nonanal, allyl methyl sulfide, levomenthol) achieved AUC 86.3% (67% sensitivity at 86% specificity). By tertile, severe CRS occurred in 8/9 (89%) high-risk versus 2/9 (22%) low-risk patients (Cox HR 6.82, 95% CI 1.41-32.9, p=0.017) and severe ICANS occurred in 8/9 (89%) versus 2/9 (22%) (HR 8.28, 95% CI 1.73-39.6, p=0.008). Each 1-SD score increase corresponded to a 3.80-fold higher hazard of severe CRS (p<0.001) and 4.36-fold higher hazard of severe ICANS (p<0.001). In head-to-head comparison, the 3-VOC ICANS panel outperformed the modified Endothelial Activation and Stress Index (mEASIX) (delta-AUC +0.36, DeLong 1-sided p=0.008). The 4-VOC CRS panel had numerically higher AUC than mEASIX (delta-AUC +0.19, p=0.150). 
Conclusions: Pre-infusion exhaled breath VOC panels stratify CAR T-cell recipients by severity and timing of severe CRS and ICANS, providing a non-invasive complement to existing serum biomarkers. Multi-institutional validation is warranted.
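
The AUC and sensitivity-at-specificity figures quoted in this abstract can be read through a small worked example. The sketch below uses made-up scores (not study data), and the function names are hypothetical; it is not the SMAGS-LASSO procedure itself, only the two summary metrics computed from a finished score.

```python
def auc(pos_scores, neg_scores):
    """Probability that a positive case outranks a negative one (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_at_specificity(pos_scores, neg_scores, min_specificity):
    """Best sensitivity over all thresholds whose specificity meets the floor."""
    best = 0.0
    for t in sorted(set(pos_scores) | set(neg_scores)):
        # A case is called positive when its score is >= t.
        spec = sum(n < t for n in neg_scores) / len(neg_scores)
        sens = sum(p >= t for p in pos_scores) / len(pos_scores)
        if spec >= min_specificity:
            best = max(best, sens)
    return best


# Toy scores, invented for illustration only.
severe = [0.9, 0.8, 0.4]
non_severe = [0.7, 0.3, 0.2, 0.1]

toy_auc = auc(severe, non_severe)
toy_sens = sensitivity_at_specificity(severe, non_severe, 1.0)
```

With only a handful of cases, each threshold moves sensitivity or specificity in large steps, which is one reason small cohorts call for the cross-validation and optimism correction described above.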

3

Multimodal axes reveal individualized amyloid-β, tau, and neurodegeneration coupling in aging and Alzheimer's disease

Poulakis, K.; Ioannou, K.; Bezgin, G.; Chiotis, K.; Iturria-Medina, Y.

2026-05-26 neurology 10.64898/2026.05.24.26353955 medRxiv

Top 0.8%

1.7%

Can we decode Alzheimer's disease (AD) heterogeneity into a few portable axes that capture how amyloid-β, tau and neurodegeneration (A-T-N) spatially co-vary in vivo? To answer this question, we built a pipeline that harmonizes longitudinal amyloid-β/tau PET and T1 MRI (gray matter) from the ADNI cohort (12,430 images) with mixed-effects modeling and then derived stage-specific multimodal axes (mVCs) using linked component analysis, with robustness tested in simulations and external validation in the OASIS cohort (4,958 images). We identified a small set of multimodal axes that (i) recapitulate early tau-weighted variation in cognitively unimpaired (CU) individuals, AD-like A-T-N coupling in cognitively impaired (CI) individuals and atypical CU and CI participants with posterior (precuneus/occipitoparietal) and fronto-insular/frontal-weighted patterns, (ii) map onto domain-specific cognition, APOE ε4, and blood/CSF biomarkers of neurodegeneration, neuroaxonal injury and astrocyte activation, (iii) predict clinical transitions, (iv) generalize in an independent cohort, and (v) demonstrate modelling robustness to missing data, high dimensionality, and cross-cohort variability, enabling direct application of the extracted axes to new datasets for biomarker discovery and stratification. Multimodal axes provide a portable, interpretable layer for quantifying amyloid-β-tau-neurodegeneration coupling at the individual level, complementing current biomarker-based staging frameworks based on A-T-N status and tau PET topography, and can be computed on new datasets to aid clinical assessment and trial enrichment.

4

Antibiotic Timing and Survival After Immune Checkpoint Inhibitor Initiation in Patients With Cancer

Zhang, K.; John, D.; Li, W. T.; Hogarth, M.; McKay, R. R.; Ongkeko, W. M.

2026-05-28 oncology 10.64898/2026.05.27.26354193 medRxiv

Top 1.0%

1.3%

Importance: While gut dysbiosis is known to impair response to immune checkpoint inhibitors (ICIs), the relative clinical impact of antibiotic timing (pre- vs. post-ICI initiation) remains unclear. Objective: To evaluate whether antibiotic timing differentially influences overall survival (OS) in a large, multi-institutional pan-cancer cohort. Design, Setting, and Participants: This retrospective cohort study utilized deidentified electronic health record data from six academic medical centers within the University of California Health system. We included 21,108 adults with any malignancy who received PD-1, PD-L1, or CTLA-4 inhibitors between January 2014 and December 2024. Exposures: Antibiotic exposure windows were categorized as pre-only (-60 to -1 days), post-only (+1 to +60 days), both windows, or none. Main Outcomes and Measures: The primary outcome was overall survival (OS) calculated from the first ICI dose. Multivariable Cox proportional hazards models adjusted for demographics, tumor type, line of therapy, and baseline health indicators (albumin, NLR, and recent hospitalization). Results: Among 21,108 patients, 17.3% had pre-only exposure, 13.3% had post-only exposure, and 60.6% had no exposure. In the multivariable model, post-only exposure (HR, 1.27; 95% CI, 1.20-1.35) and combined pre- and post- exposure (HR, 1.31; 95% CI, 1.23-1.40) were significantly associated with higher mortality. Pre-only exposure was not significantly associated with OS (HR, 1.04; 95% CI, 0.99-1.10). Subgroup analyses by tumor type showed consistent trends across major malignancies, including head and neck (Post HR, 1.46) and renal cell carcinoma (Post HR, 1.26). Conclusions and Relevance: In contrast to some smaller studies, this large-scale analysis indicates that antibiotic exposure after ICI initiation carries a greater risk than exposure prior to treatment. 
These findings highlight the need for rigorous antibiotic stewardship strategies specifically during the early phases of immunotherapy treatment.
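
The exposure windows defined in this abstract translate directly into a small classifier. The sketch below is an illustration only: the function name is hypothetical, and treating day 0 and days beyond ±60 as outside both windows is an assumption, since the abstract does not say how those days were handled.

```python
def exposure_category(antibiotic_days):
    """Classify antibiotic exposure from day offsets relative to the first ICI dose.

    antibiotic_days: iterable of integer offsets (negative = before first dose).
    Windows follow the abstract: pre = -60..-1, post = +1..+60.
    Assumption: day 0 and offsets beyond 60 days fall in neither window.
    """
    days = list(antibiotic_days)
    pre = any(-60 <= d <= -1 for d in days)
    post = any(1 <= d <= 60 for d in days)
    if pre and post:
        return "both"
    if pre:
        return "pre-only"
    if post:
        return "post-only"
    return "none"


# Invented example patients, not study data.
examples = {
    "patient_a": exposure_category([-30]),
    "patient_b": exposure_category([10]),
    "patient_c": exposure_category([-5, 20]),
    "patient_d": exposure_category([]),
}
```

Keeping the four categories mutually exclusive, as here, is what allows the post-only and both-window hazard ratios to be reported separately.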

5

Tracking the Dynamic Trajectories: A Global-to-Local Pharmacovigilance Analysis of GLP-1 Receptor Agonists

Lu, S.; Ruan, X.; Wang, L.; Wang, X.; Sameer, M.; Liu, H.

2026-06-01 health informatics 10.64898/2026.05.28.26354401 medRxiv

Top 1%

0.7%

Although GLP-1/GIP receptor agonists demonstrate unprecedented weight-loss efficacy, their rapid clinical adoption has revealed significant real-world tolerability challenges. To evaluate their dynamic safety profiles, we developed a macro-to-micro pharmacovigilance framework by combining global FAERS reports with local UT Physician EHR data. Macroscopically, we distilled 17 shared adverse events across the drug class from FAERS using disproportionality analysis. Microscopically, local EHR data (289,655 longitudinal treatment sessions across 71,316 patients) revealed that 51.6% of GLP-1 sessions terminated within 90 days. Furthermore, temporally stratified logistic regression demonstrated that initial exposure (0 to 30 days) correlated strongly with nausea and vomiting, which attenuated in extended sessions, whereas extended exposure (>2 years) uncovered late-onset risks, notably incident hepatic steatosis. Ultimately, this time-aware framework reveals that GLP-1 safety profiles are profoundly duration-dependent, providing critical insights into both acute intolerances and long-term medication safety.
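
The abstract does not say which disproportionality measure was applied to FAERS. One common choice is the reporting odds ratio (ROR), sketched here under that assumption with invented counts; the function name is hypothetical.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """ROR and approximate 95% CI from a 2x2 spontaneous-report table.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se_log)
    upper = math.exp(math.log(ror) + z * se_log)
    return ror, lower, upper


# Invented counts for illustration only (not FAERS data).
ror, lower, upper = reporting_odds_ratio(a=20, b=80, c=100, d=1800)
signal = lower > 1.0  # a frequently used screening rule, assumed here
```

A lower confidence bound above 1 flags the drug-event pair for review; it indicates disproportionate reporting, not a causal effect.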

6

Phenome-Wide Association Study of Pre-Cancer Diagnosis Electronic Health Records Identifies Risk and Inverse Associations in the All of Us Research Program

Rich, C. C. D.; Bang, E. J.; Bair, A. B.; Richardson, B. E.; Millington, J. L.; Bates, B. A.; Davis, M. F.; Bailey, M. H.

2026-05-28 health informatics 10.64898/2026.05.26.26353823 medRxiv

Top 2%

0.7%

Background: The All of Us Research Program represents a rich resource for cancer epidemiology research, with over 400,000 participants with whole genome sequences linked to electronic health records (EHR). Large cancer datasets often focus exclusively on cases without controls and neglect pre-diagnosis healthcare occurrences. Here, we perform a phenome-wide association study (PheWAS) of EHR data at least 1 year pre-diagnosis between cancer cases and matched controls, revealing co-occurring and mutually exclusive phenotypes. Methods: We identified 55,000+ cancer cases across 21 cancer types in All of Us version 8. To eliminate age-related confounding, we implemented a two-stage matching and censoring strategy: loose matching on demographics to establish index dates and cohort comparability, followed by right-censoring of EHR data (excluding 1 year pre-diagnosis/index), then 1:2 matching to address residual demographic imbalance. We tested associations between 23,193 cancer cases, 46,386 matched controls and approximately 1,600 clinical phenotypes using logistic regression adjusted for sex at birth, self-reported race, age at diagnosis/index date, and two censored EHR metrics: observation window and unique condition count, with Bonferroni correction for multiple testing. Results: Our analysis identified 232 significantly associated phenotypes, confirming established cancer risk factors including elevated prostate specific antigen (OR = 2.92, 95% CI: 2.65-3.23; p-value = 1.8 x 10^-101) and multinodular goiter (OR = 1.73, 95% CI: 1.56-1.91; p-value = 6.7 x 10^-27). Further investigation into the relationship between several phenotypes with seemingly inverse effects is warranted. Conclusions: This PheWAS of EHR data at least 1 year pre-diagnosis leveraged the diversity of All of Us to examine how clinical phenotypes prior to cancer diagnosis vary across cancer types and racial groups. 
Our findings validate All of Us as a robust platform for cancer epidemiology research, confirming established risk factors at scale across diverse populations. This work provides methodological insights for EHR-based susceptibility analyses and demonstrates the value of agnostic phenome-wide approaches for generating hypotheses in precision medicine.
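
To give a sense of scale for the Bonferroni correction mentioned in this abstract, the sketch below computes the per-test threshold for roughly 1,600 phenotypes. The family-wise alpha of 0.05 is an assumption (the abstract does not state it), and 1,600 is the abstract's approximate count rather than the exact number of tests.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Assumed family-wise alpha; approximate phenotype count from the abstract.
threshold = bonferroni_threshold(0.05, 1600)

# P-values quoted in the abstract.
p_values = {
    "elevated prostate specific antigen": 1.8e-101,
    "multinodular goiter": 6.7e-27,
}
significant = {name: p < threshold for name, p in p_values.items()}
```

Both quoted associations clear the corrected threshold by many orders of magnitude, so the conclusion does not depend on the exact number of phenotypes tested.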

7

Beyond Identifier Matching: An Empirical Characterization of Failure Modes in Biomedical Knowledge Graph Integration

Hu, S.; Cheng, H.; Gillenwater, L.; Manpearl, K.; Mandava, A.; Wang, Y.; Pividori, M.; Stranger, B.; Krishnan, A.; Greene, C.; Gao, Y.

2026-05-28 health informatics 10.64898/2026.05.26.26354182 medRxiv

Top 2%

0.5%

Objective. Biomedical knowledge graphs (KGs) such as PrimeKG, Hetionet, UMLS, and PharmGKB are increasingly used as the substrate for downstream machine-learning, retrieval-augmented generation, drug-repurposing, and electronic health record (EHR) augmentation pipelines. The dominant assumption in published work is that integrating two or more such KGs is a tractable engineering step solved by identifier (ID) matching. This paper interrogates that assumption empirically. We quantify how much concept overlap survives realistic alignment, and we characterize the new failure modes introduced by the methods that practitioners reach for when ID matching is insufficient. Materials and Methods. We compared four widely used biomedical KGs (PrimeKG, Hetionet v1.0, the full UMLS Metathesaurus, and PharmGKB) across eleven node types using a tiered alignment pipeline: (1) direct ID matching for nodes sharing a primary vocabulary; (2) cross-ontology bridging using standard mappings (e.g., MONDO-DOID, HPO-UMLS, HPO-UMLS-MeSH for side effects, NCBI Gene-HGNC-UMLS, UBERON-FMA/SNOMEDCT_US/NCI/MeSH for anatomy); (3) ClinicalBERT cosine-similarity grouping at threshold >= 0.98 for over-segmented disease nodes, with a deterministic suffix-stripping canonicalizer; (4) exact name matching for ontology-poor types (anatomy, REACTOME pathways); and (5) embedding-based fuzzy matching with UMLS lookup (SapBERT and ClinicalBERT) for free-text microbiome concepts. We applied the pipeline to a 698-concept gut-microbiome benchmark spanning taxa, pathways, and disease labels, validated grouping decisions against the curated SSSOM mappings released by the MONDO project, and audited the ClinicalBERT consolidation against five clinical-genetics case studies drawn from the literature. Results. Per-type pairwise coverage was strikingly asymmetric. 
Genes/proteins and the three Gene Ontology categories aligned cleanly across PrimeKG and Hetionet (mutual coverage 94-99%), but disease overlap was sparse: only 0.7% of PrimeKG individual disease nodes mapped to Hetionet, rising to 2.0% after MONDO grouping (versus 78.7% and 18.4% from the Hetionet side). PrimeKG-to-UMLS coverage spanned 100% (effect/phenotype via HPO) down to 20.8% (REACTOME pathways), with drugs at 73.7% and anatomy at 58.8%. PrimeKG-to-PharmGKB drug coverage required up to two bridging hops (DrugBank -> UMLS -> RxNorm/ATC/MeSH). Bigger was not uniformly more complete: on a 698-concept microbiome drug benchmark, Hetionet missed 0 concepts while PrimeKG missed 16. ClinicalBERT-based grouping consolidated 22,205 raw MONDO disease nodes into 17,080 groups but introduced three reproducible failure modes documented in case studies: (i) peer over-merging: for example, all 22 osteogenesis imperfecta subtypes collapsed into a single node despite distinct severity classes; (ii) parent-child collapse: e.g. acute myeloid leukemia merged with myeloid leukemia, erasing the acute/chronic distinction that drives clinical management; and (iii) lexical false positives: neurofibromatosis and schwannomatosis grouped together despite cellular-pathology differences. Discussion. Identifier matching alone is a weak baseline for biomedical KG integration. Cross-ontology bridges and embedding-based consolidation expand coverage but do so at the cost of clinically meaningful resolution, and the resulting failures are systematic rather than random. Reporting only aggregate coverage statistics obscures these losses, which propagate silently into downstream tasks. Conclusion. We provide reusable per-type coverage tables, a taxonomy of three integration failure modes, and concrete recommendations for downstream studies that depend on a unified biomedical KG. 
We argue that future KG integration work should report per-type coverage and per-cluster confidence rather than aggregate match rates.
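
Tier (3) of the pipeline, cosine-similarity grouping at a threshold of 0.98, can be sketched as follows. This is not the authors' implementation: the vectors are toy 2-D stand-ins rather than ClinicalBERT embeddings, the node names are invented, and merging every pair above the threshold with union-find (single linkage) is an assumption about how groups are formed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def group_by_similarity(embeddings, threshold=0.98):
    """Merge nodes whose pairwise cosine similarity is >= threshold."""
    names = list(embeddings)
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy vectors and hypothetical node names, for illustration only.
toy = {
    "disease A type 1": [1.0, 0.00],
    "disease A type 2": [1.0, 0.05],
    "disease B": [0.0, 1.0],
}
groups = group_by_similarity(toy)
```

In this toy case the two subtypes collapse into one group because their names (and hence vectors) are nearly identical, which mirrors the peer over-merging pattern the abstract describes; a per-cluster confidence score would make such merges visible.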

8

Personalized clinical reference intervals for routine precision medical care

Zhang, C.; Chen, Y.-L.; Jamilov, A.; Liu, E.; Shree, S.; Lam, B. D.; Foy, B. H.

2026-05-30 health informatics 10.64898/2026.05.28.26354363 medRxiv

Top 2%

0.5%

Most routine clinical markers are interpreted using population-based reference intervals, despite being regulated around patient-specific homeostatic setpoints. This mismatch obscures physiologic shifts, inhibiting detection of early disease signatures. Here, we develop a novel Bayesian inference method that adaptively constructs personalized reference intervals using each patient's existing health records. In analysis of >100 million lab tests in >800,000 patients, these personalized intervals can be accurately constructed with only minimal prior data, meaning this method can be applied near universally. We show that across 43 common lab markers, patient setpoints are strongly associated with future morbidity, with signal strength increasing as more test data is collected. Deviation from personalized reference intervals provides strong and novel risk signatures across diverse disease states, including hypothyroidism, hematologic cancers, kidney disease, and pregnancy complications. Importantly, personalized reference intervals capture a different risk signature to existing population-based approaches, with the highest risk patients being those who deviate from both intervals simultaneously. In a targeted clinical use case study of iron infusion, use of personalized reference intervals greatly improved prediction of treatment efficacy and allowed precise tracking of treatment responses. Our results illustrate how existing health records can be used to construct personalized benchmarks for nearly all common clinical tests, driving a new paradigm for precision laboratory medicine.

9

Deep Learning Spatial Profiling of CD103+CD8+ T Cells and Survival in Rectal Cancer After Neoadjuvant Chemoradiotherapy

Abe, T.; Yamashita, K.; Nagasaka, T.; Fujita, M.; Ueda, Y.; Miyake, S.; Ito, R.; Adachi, Y.; Ando, M.; Tsuneki, T.; Okazoe, Y.; Konaka, R.; Takahashi, T.; Kagiyama, H.; Tachibana, T.; Imai, M.; Yoshida, T.; Saito, M.; Mukohyama, J.; Kanayama, K.; Koma, Y.-I.; Otowa, Y.; Hasegawa, H.; Ikeda, T.; Koterazawa, Y.; Aoki, T.; Harada, H.; Urakawa, N.; Goto, H.; Kanaji, S.; Yanagimoto, H.; Matsuda, T.; Takamura, S.; Yamashita, T.; Sasaki, R.; Fukumoto, T.; Kakeji, Y.

2026-05-28 oncology 10.64898/2026.05.26.26353629 medRxiv

Top 3%

0.3%

Background: CD8+ tumor-infiltrating lymphocytes (TILs) are established prognostic markers in colorectal cancer, yet the clinical significance of CD103+CD8+ tissue-resident memory-like (TRM-like) T cells in locally advanced rectal cancer (LARC) after neoadjuvant chemoradiotherapy (NACRT) remains unknown. Methods: We quantified CD8+ and CD103+CD8+ T-cell densities in stromal and intratumoral compartments of post-NACRT resection specimens from 40 LARC patients using Cu-Cyto, a deep learning-based imaging cytometry platform. Associations with survival, pathological response, and adjuvant chemotherapy (AC) were examined. Treatment-induced T-cell dynamics were assessed in paired pretreatment biopsies and post-NACRT resections (n = 9). Results: High stromal CD103+CD8+ density independently predicted better 5-year RFS (67.4% vs. 12.1%, p < 0.001) and OS (80.0% vs. 26.6%, p = 0.016); intratumoral density showed no prognostic significance. Pathological response correlated with stromal CD8+ but not CD103+CD8+ density. Paired analysis revealed a selective non-expansion of the CD103+ subset: stromal CD8+ T cells increased significantly after NACRT while CD103+CD8+ density remained unchanged. AC may preferentially benefit patients with low stromal CD103+CD8+ density. Conclusions: Stromal CD103+CD8+ T-cell density is a robust independent prognostic biomarker in rectal cancer after NACRT that appears to reflect pre-existing rather than treatment-induced immunity. Given its stability across NACRT, pretreatment biopsy assessment may provide equivalent prognostic information, with potential implications for patient stratification before treatment initiation.

10

Multivariate determinants of wearable-measured sleep quality across a large observational cohort: roles of physical activity, gut microbiome, blood analytes, and lifestyle factors.

Cavon, J.; Perez, C.; Quinn-Bohmann, N.; Magis, A. T.; Gibbons, S. M.

2026-05-29 health informatics 10.64898/2026.05.27.26354250 medRxiv

Top 3%

0.2%

Emerging evidence links the gut microbiome to sleep quality, yet measuring sleep at scale remains challenging. Commercial wearables, such as Fitbit, capture objective sleep and activity data in naturalistic settings. We integrated Fitbit data from a large, deeply-phenotyped cohort with paired lifestyle and health questionnaires. Wearable-derived measures aligned well with self-reported sleep, activity, and happiness. We identified dozens of covariate-adjusted associations between Fitbit-derived sleep features, lifestyle factors, and multi-omic data. Among molecular feature sets, the gut microbiome showed the greatest number of associations with sleep quality: butyrate-producing genera were positively associated with sleep and amplified the benefits of physical activity. Oscillospira, in particular, was consistently associated with better sleep. In blood, insulin, omega-3, and cortisol correlated with poorer sleep, whereas lower alcohol intake and mineral supplements correlated with better sleep. These robust, covariate-adjusted findings advance mechanistic understanding of the gut-sleep axis and broader molecular and lifestyle determinants of sleep quality.

11

Integrative Genetic Analyses of Lipid Metabolism and Multiple Sclerosis Severity Using Metabolome-Wide and Cis-Mendelian Randomization

Noroozi, R.; Higgins Tejera, C.; Chen, M.; Briggs, F. B. S.; Bhargava, P.; Fitzgerald, K. C.

2026-05-29 neurology 10.64898/2026.05.27.26354239 medRxiv

Top 3%

0.2%

The course of multiple sclerosis (MS) is highly heterogeneous, yet the biological mechanisms underlying this variability remain incompletely understood. Although metabolic alterations have increasingly been associated with disease progression, existing observational evidence is limited by confounding, reverse causation, and an inability to establish causal mechanisms. To bridge this gap, we used a metabolome-wide Mendelian Randomization (MR) framework, including thorough sensitivity analyses, to identify metabolites genetically linked to MS severity that can causally affect it. Bidirectional MR analyses revealed a subset of amino acid and lipid pathways with strong, consistent effects across different MR approaches, confirmed by tests for heterogeneity, horizontal pleiotropy, and LD confounding. For metabolites prioritized by metabolome-wide MR with evidence of causal effects, we conducted genetic colocalization at loci encompassing proximal enzyme-encoding genes, leveraging the corresponding instrumental variants to assess shared underlying genetic signals. This process revealed shared genetic signals between metabolite levels and MS severity, mapped to the FADS1/2 and CYP4F2 loci. A subsequent pathway-resolved set of cis-MR analyses across FADS1/2-derived polyunsaturated fatty acid (PUFA) metabolites, using a functional variant that proxies reduced Δ5-desaturase activity, showed consistent effects indicating that FADS1 perturbation is associated with MS severity. Collectively, these results highlight FADS1 as a key driver of PUFA-related causal effects on MS severity in both systemic (circulating metabolites) and brain cell-specific contexts. Additional supportive cis-MR evidence implicates the disruption of CYP4F2 as another PUFA-metabolizing enzyme.

12

Redefining Extent Of Resection After Meningioma Surgery: a Multicentre Observational Machine Learning Analysis Comparing Simpson, Radiological and Volumetric Grading

Pandit, A. S.; Deehan, M.; Moudgil-Joshi, J.; Reischer, G.; Mathew, S.; Pace, G.; Fatania, G.; Dalton, A.; Nair, R.; Hyare, H.; Mallon, D.; Kitchen, N.; Marcus, H. J.; Nachev, P.

2026-05-27 oncology 10.64898/2026.05.23.26353944 medRxiv

Top 4%

0.1%

Background: Extent of resection remains central to meningioma management, yet Simpson grading is subjective and may not reflect measurable postoperative residual disease. We compared surgeon-reported Simpson grade, report-derived radiological grading, and residual tumour volumetry across a multicentre cohort. Methods: We performed a retrospective study across two tertiary neurosciences centres comprising four hospitals, including patients undergoing primary cranial meningioma resection from 2006 to 2025. Postoperative magnetic resonance imaging (MRI) reports were harmonised using weakly supervised natural language processing based on term frequency-inverse document frequency (TF-IDF) and a linear support vector machine classifier. Residual tumour volume was segmented from contrast-enhanced postoperative MRI and log-transformed. Concordance between Simpson and radiological gross-total/subtotal resection classification was assessed using absolute agreement and prevalence-adjusted bias-adjusted kappa (PABAK). Cox models assessed recurrence-free survival, with bootstrap validation and anatomical and scan-timing sensitivity analyses. Results: Among 912 patients, recurrence or residual progression occurred in 281. Surgical-radiological agreement was substantial but imperfect (absolute agreement 74%; PABAK 0.61), with lower agreement in skull-base and parafalcine-parasagittal tumours. In adjusted models, recurrence hazard increased with Simpson grade (hazard ratio 1.54, 95% confidence interval 1.37-1.72), radiological grade (1.92, 1.68-2.20), and log-transformed residual volume (1.20, 1.16-1.24; all p<0.0005). Optimism-corrected concordance increased from Simpson grade to radiological grade and log-volumetry (0.692, 0.733, and 0.748), with this ranking preserved across sensitivity analyses. Conclusions: Imaging-based postoperative residual disease measures outperformed Simpson grade. TF-IDF-assisted report-derived grading provides a scalable bridge to volumetry, while quantitative residual volume offers the strongest prognostic representation.
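
For readers unfamiliar with PABAK, the sketch below computes it from paired ratings using the standard definition, (k x observed agreement - 1) / (k - 1) for k categories, which reduces to 2 x agreement - 1 for two categories. The labels are invented, the function name is hypothetical, and the abstract does not state how many categories entered the authors' calculation.

```python
def pabak(labels_a, labels_b, n_categories=2):
    """Prevalence-adjusted bias-adjusted kappa from two raters' paired labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    agreement = sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)
    return (n_categories * agreement - 1) / (n_categories - 1)


# Invented paired ratings (GTR = gross-total, STR = subtotal resection).
surgeon = ["GTR", "GTR", "STR", "GTR", "STR", "GTR", "GTR", "STR", "GTR", "GTR"]
radiology = ["GTR", "STR", "STR", "GTR", "STR", "GTR", "GTR", "GTR", "GTR", "GTR"]

toy_pabak = pabak(surgeon, radiology)
```

Unlike Cohen's kappa, PABAK depends only on observed agreement and the number of categories, so it is not pulled down when one category (here, gross-total resection) dominates.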

13

Sensitive Glioma Detection and Recurrence Monitoring Using a Machine Learning Model Based on Circulating Monocytes

Wu, W.; Chai, R.; Xia, P.; Wu, L.; Yu, B.; Chen, X.; Pang, B.; Chen, D.; Wang, Y.; Wang, N.; Li, X.; Liu, H.; Deng, Q.; Wan, F.; Lyu, F.; Wang, L.; Zhang, W.; Zhang, J.; Jiang, T.; Wang, Q.

2026-06-01 oncology 10.64898/2026.05.29.26354409 medRxiv

Top 4%

0.1%

Background: Non-invasive diagnosis and reliable recurrence surveillance remain critical unmet needs in gliomas. Glioma induces profound systemic immune alterations despite its anatomical confinement to the central nervous system. Circulating immune cells, particularly monocytes, are key mediators of tumor-host crosstalk and may retain tumor-induced transcriptional imprints. However, their potential clinical utility as blood-based biomarkers for detection and monitoring remains largely unexplored. Methods and findings: In this study, we performed integrated single-cell RNA sequencing of blood immune cells and demonstrated that circulating CD14+ monocytes are significantly expanded in glioma patients, exhibiting features of differentiation arrest and increased transcriptional plasticity. These cells harbor glioma-specific molecular signatures distinct from those observed in healthy controls and patients with other tumors. Leveraging these findings, we developed an ensemble machine learning diagnostic model based on transcriptomic profiles of circulating CD14+ monocytes (training cohort, n=107), which achieved a mean area under the receiver operating characteristic curve (AUC) of 0.971 during cross-validation. In an independent cohort of 567 participants, the model maintained high diagnostic accuracy, yielding an AUC of 0.877 for distinguishing glioma from controls and other tumors. It also achieved a recurrence detection AUC of 0.969 in 51 postoperative samples. Moreover, in a prospective follow-up study involving 30 glioma patients, lower postoperative model-derived scores were significantly associated with prolonged progression-free survival (log-rank test, P=0.043), supporting its prognostic utility. Conclusion: We demonstrate that circulating CD14+ monocytes undergo glioma-specific transcriptional reprogramming, generating a systemic tumor-associated signal that can be captured via transcriptomic profiling. This blood-based diagnostic model provides a non-invasive, scalable approach for glioma detection, recurrence surveillance, and outcome prediction.

14

A priority index-based computational medicine framework (PimRNA) for prioritising personalised mRNA cancer vaccines

Fang, H.; Tan, T.

2026-05-29 oncology 10.64898/2026.05.26.26354114 medRxiv

Top 4%

0.1%

Background: The development of personalised mRNA cancer vaccines holds considerable promise for oncology, yet a significant translational gap persists between neoantigen identification and the selection of therapeutically impactful targets. Current approaches predominantly prioritise human leukocyte antigen (HLA) binding affinity and immunogenicity, often overlooking the systems-level biological context of the target. This can inadvertently favour immunogenic but biologically peripheral peptides that exert limited influence on tumour signalling networks, thereby constraining vaccine efficacy. Furthermore, mRNA therapeutics must satisfy additional design requirements, including favourable codon usage and favourable secondary-structure stability, which directly affect in vivo translation and half-life. A unified computational framework that integrates neoantigen discovery with network biology is therefore critically needed. Results: Here, we present PimRNA, a Priority index (Pi)-centric computational medicine framework that bridges this gap by unifying neoantigen identification, mRNA sequence optimisation, and gene interaction network analysis. First, high-confidence tumour-specific HLA class I and II neoantigenic peptides are identified from paired tumour-normal genomic and tumour transcriptomic data using NeoDisc. Second, the coding sequences of these peptides are optimised for stability and translational efficiency with LinearDesign, yielding a core set of neoantigen-encoding mRNAs. Third, a random walk with restart algorithm is applied to a knowledgebase of gene interactions to identify peripheral genes exhibiting significant network connectivity to core genes, generating a gene-predictor matrix in which each gene is assigned an affinity score reflecting its network proximity to immunogenic neoantigens. 
These scores are consolidated into a single, unified priority rating (0-5) for each gene, followed by subnetwork analysis that reveals therapeutically relevant gene modules. Application of PimRNA to breast cancer and melanoma datasets demonstrates that it successfully selects high-confidence immunogenic neoantigen candidates embedded within biologically meaningful tumour-specific networks. Conclusion: PimRNA provides a systems biology foundation for mRNA vaccine design, moving beyond isolated immunogenicity to prioritise targets that are both highly presented and central to tumour-relevant biological networks. This framework offers a generalisable strategy for the rational discovery and prioritisation of mRNA therapeutics, significantly advancing the field of computational medicine towards personalised cancer vaccines.
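
The third step, random walk with restart (RWR) from core genes over an interaction network, can be illustrated on a toy graph. The sketch below is a generic RWR, not PimRNA's code: the gene labels, the four-node chain, and the restart probability of 0.5 are all assumptions chosen for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities for a walker that restarts at the seeds.

    adjacency: dict mapping each node to a list of neighbours (undirected graphs
               should list each edge in both directions).
    seeds:     set of seed nodes (here standing in for "core" genes).
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass returns to the seeds; the rest spreads along edges.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                continue
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical chain: CORE - G1 - G2 - G3.
toy_network = {
    "CORE": ["G1"],
    "G1": ["CORE", "G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2"],
}
scores = random_walk_with_restart(toy_network, seeds={"CORE"})
```

The resulting scores fall off with network distance from the seed, which is the sense in which an affinity score can reflect a peripheral gene's proximity to neoantigen-bearing core genes.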

15

T cell transcriptional and receptor signatures predict response to telomerase vaccination in prostate cancer

Hoye, E.; Natkin, R.; Sajnani, K.; Engedal, N.; Simensen, J. E.; Hakkola, S.; Kiviaho, A.; Ballesio, F.; Cecchetto, T.; Ellingsen, E. B.; Westhrin, M.; Hovig, E.; Mathelier, A.; Visakorpi, T.; Tammela, T. L.; Murtola, T. J.; Eerola, S.; Nykter, M.; Lilleby, W.; Urbanucci, A.

2026-05-30 oncology 10.64898/2026.05.25.26354038 medRxiv

Top 4%

0.1%

While prostate cancer (PC) is defined as immunologically cold, limiting the efficacy of immune checkpoint inhibitors, therapeutic vaccination targeting tumor-associated antigens represents an attractive strategy to promote disease control in patients with low-volume metastatic disease. The UV1 cancer vaccine is based on immunization with tripeptide fragments from human telomerase reverse transcriptase (hTERT), and a phase II clinical trial demonstrated induction of a robust T cell response in men with de novo metastatic castration-sensitive prostate cancer (mCSPC). Comparison with long-term survival data of non-metastatic CSPC patients as reference showed that, despite metastatic disease at diagnosis, UV1-treated patients who mounted an early vaccine-induced immune response achieved progression-free and overall survival comparable to non-metastatic patients. We examined biological determinants of clinical benefit following UV1 vaccination, including tumor transcriptome and T cell receptor (TCR) profiling from circulating and tissue-resident T cells of the 22 men enrolled. Analysis of diagnostic and post-UV1 treatment biopsies revealed that low baseline exhaustion of T cells and higher CD8+ T cell abundance are associated with early immune response to the vaccine and longer survival. Moreover, we identified specific TCR motifs associated with early responders that may indicate potential benefit from UV1 vaccination. These findings indicate that baseline intratumoral T cell exhaustion state and repertoire shape responsiveness to hTERT vaccination and long-term outcome. Overall, our study underlines how baseline immune profiling may be used as a companion biomarker to predict mCSPC patients most likely to benefit from therapeutic vaccination.

16

Development and Validation of a Machine Learning Model to Predict Prognosis in Patients with Advanced Head and Neck Cancer

Zhang, K.; Gao, L.; John, D.; Li, W. T.; Hogarth, M.; Coffey, C. S.; Ongkeko, W. M.

2026-05-28 oncology 10.64898/2026.05.27.26354194 medRxiv

Top 5%

0.1%

Importance Prognostic tools beyond staging are needed to guide treatment and counseling in head and neck squamous cell carcinoma (HNSCC). Objective To develop and externally validate a machine learning model predicting survival in advanced HNSCC using routinely collected clinical and biomarker data. Design, Setting, and Participants Retrospective, multi-institutional cohort study including 2,385 patients with stage III-IV HNSCC diagnosed from 2012-2022 in the University of California Health Data Warehouse (UCHDW). Patients were randomly split into training (n = 1,908) and test (n = 477) sets. Partial external validation used 7,749 patients from the Surveillance, Epidemiology, and End Results (SEER) registry (2010-2020). Exposures Demographic, tumor, treatment, comorbidity, and biomarker variables recorded at or before diagnosis. Main Outcomes and Measures The primary outcome was all-cause mortality within 70 months. Cox proportional hazards models included all predictors. Discrimination was assessed with Harrell's concordance index (C-index), calibration with predicted vs observed survival, and stratification with Kaplan-Meier curves. A Random Survival Forest (RSF) was trained for benchmarking and interpretability using Shapley Additive exPlanations (SHAP). Results Among 2,385 patients in UCHDW (median age, 63 years; 29.0% mortality), the Cox model achieved a C-index of 0.735 in the internal test set. Risk quartiles showed clear separation on Kaplan-Meier curves (log-rank p < 0.0001). In the SEER cohort (n = 7,749), where only demographic, staging, subsite, and treatment variables were available, the reduced Cox model achieved a C-index of 0.688, with calibration showing modest underestimation of survival in high-risk groups. Age, T stage, Charlson Comorbidity Index, neutrophil-to-lymphocyte ratio, and platelet count were among the strongest predictors, while surgery was associated with improved survival. 
The RSF achieved a C-index of 0.758 internally, with SHAP highlighting nonlinear effects of albumin, BMI, and inflammatory markers. Conclusions and Relevance A machine learning model using routine clinical and biomarker data demonstrated good prognostic performance in advanced HNSCC, with partial external validation. Such approaches may support individualized survival estimates, risk stratification, and treatment discussions, but broader validation is required before clinical adoption.
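For readers unfamiliar with the discrimination metric reported in this abstract, Harrell's concordance index can be sketched as follows. This is a minimal illustration on made-up data, not the authors' code: the function name and the toy values are hypothetical, and tied event times are simply skipped for brevity.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by the model.

    times       : follow-up time for each patient
    events      : 1 if death was observed, 0 if the patient was censored
    risk_scores : model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                # Patient i died before patient j's last follow-up: comparable pair.
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up example: four patients, the third one censored at month 15.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
c_index = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the 0.688 to 0.758 range reported above.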

17

Advanced Multimodal AI for Predicting Long-Term Functional Outcomes After Ischemic Stroke Using Only Admission Data

McBride, F.; Huang, H.; Kapoor, A. K.; Oermann, E.; Frontera, J. A.; Razavian, N.

2026-05-29 neurology 10.64898/2026.05.27.26354289 medRxiv

Top 5%

0.1%

Background and Purpose Prognostication after acute ischemic stroke often relies on limited variables and simple risk scores, despite richer information being available at admission. We developed a multimodal AI model using admission data to predict modified Rankin Scale (mRS) outcomes and compared it to established tools. Methods In a retrospective study of ischemic stroke/TIA patients, we trained three modality-specific models on admission non-contrast head CT, history and physical notes, and structured clinical variables, and combined them in a weighted-average ensemble. We predicted binary (mRS 0-2 versus 3-6) and ordinal mRS (0-6) outcomes at discharge and 90 days. Performance on an external test cohort was compared with THRIVE and SPAN-100 scores using AUROC, AUPRC, Brier score, mean absolute error (MAE), and quadratic weighted kappa (QWK). Results A total of 6,915 patients were split into training, validation and testing cohorts in a 3:1:1 ratio. For discharge binary mRS (n=1596), the multimodal ensemble achieved significantly better discrimination (AUROC 0.859, AUPRC 0.858) with 25-61% lower Brier scores than THRIVE or SPAN-100 (all p<0.001). For 90-day binary mRS (n=207), the model also outperformed both THRIVE and SPAN-100 (AUROC 0.838, AUPRC 0.805, with 3-38% lower Brier scores). Ordinal mRS prediction showed similarly strong performance with significantly better QWK at discharge and numerically lower MAE. The multimodal ensemble model reassigned about one-third of patients to different risk categories versus THRIVE and was closer to the true discharge outcome in ~74% of discordant cases. Conclusions We developed a well-calibrated multimodal AI model for prediction of discharge and 90-day post-stroke functional outcomes using only data present at the time of admission. This model outperforms existing prognostic tools and can support early clinical decision-making.
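The weighted-average ensemble and the Brier score mentioned in this abstract can be sketched in a few lines. Everything concrete here is an assumption for illustration: the probabilities, the weights, and the function names are hypothetical, and the abstract does not state how the authors chose their ensemble weights.

```python
def weighted_ensemble(prob_lists, weights):
    """Combine per-model probabilities into one weighted-average probability per patient."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, patient_probs)) / total
        for patient_probs in zip(*prob_lists)
    ]


def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)


# Hypothetical predicted probabilities of a poor outcome (mRS 3-6) for three patients,
# one list per modality-specific model.
ct_probs = [0.8, 0.3, 0.6]
notes_probs = [0.6, 0.1, 0.8]
tabular_probs = [0.7, 0.2, 0.5]
assumed_weights = [0.5, 0.25, 0.25]  # illustrative only

ensemble_probs = weighted_ensemble([ct_probs, notes_probs, tabular_probs], assumed_weights)
observed = [1, 0, 1]  # 1 = poor outcome observed
ensemble_brier = brier_score(observed, ensemble_probs)
```

Lower Brier scores are better, since the score is zero only when every predicted probability matches the observed outcome exactly.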

18

Pre-pandemic blood profiles predict COVID-19 hospitalization and death a decade later

Jacobs, L. A.

2026-05-29 epidemiology 10.64898/2026.05.27.26354230 medRxiv

Top 5%

0.1%

COVID-19 risk scores developed during the pandemic relied on measurements contemporaneous with infection, leaving unresolved whether the metabolic and inflammatory vulnerability they capture pre-existed as a stable trait or was triggered by acute illness. Here, using 501,946 UK Biobank participants whose blood was drawn between 2006 and 2010 (at least ten years before SARS-CoV-2 emerged), we show that baseline proteomic and metabolic profiles predict both COVID-19 hospitalization (2,783 events; C-statistic = 0.676 [0.666-0.686]) and COVID-19 mortality (1,564 deaths; C-statistic = 0.730 [0.701-0.760]) from parsimonious, regularized feature sets. The IL-1 pathway index (xIL1, +0.093) was independently selected for hospitalization but not mortality, while the IL-6 trans-signaling index (xIL6, +0.040) was selected for mortality but not hospitalization, a differential pathway weighting corroborated by independent LightGBM/SHAP analysis and mirroring the subsequent success of tocilizumab (anti-IL-6R) and the limited efficacy of anakinra (anti-IL-1R) in reducing COVID-19 mortality in randomized trials conducted years later. The mortality model was additionally characterized by central adiposity (waist-hip ratio, +0.386), a respiratory compromise index (xRSP, +0.149), and prodromal cardiovascular disease (pCVD, +0.246). These findings establish that vulnerability to a novel pathogen is, in substantial part, a pre-existing and measurable prodromal state, with implications for pandemic preparedness and population-level risk stratification.

19

Immune Checkpoint Response Profiles and Resistance Mechanisms in NSCLC Revealed by Circulating Extracellular Vesicle Proteomics

Taylor, C.; Davey, M.; Allain, E. P.; Cheema, A. S.; Crapoulet, N.; Finn, N.; Abd, M.; Ouellette, R.

2026-05-26 oncology 10.64898/2026.05.25.26354042 medRxiv

Top 6%

0.0%

Background: Immuno-oncology has revolutionized cancer treatment, but some patients fail to benefit due to primary resistance and tumour-immune evasion. Extracellular vesicles (EVs) are secreted by both tumour and immune cells and mediate communication between cancer cells and the immune system. Our study used proteomic profiling of circulating EVs collected from NSCLC patients treated with immune checkpoint inhibitors (ICI) to identify predictive biomarkers of response as well as immune evasion mechanisms related to treatment resistance. Methods: EVs were isolated from plasma collected prior to ICI treatment using peptide-affinity purification, and high-throughput proteomics was performed using the Proximity Extension Assay. Differentially expressed EV proteins between durable responders (DR) and non-durable responders (NDR) were identified and evaluated using Cox proportional hazards regression, survival analysis, sex-stratified analysis, as well as pathway and network analysis. Results: Proteomics analysis identified 116 differentially expressed EV proteins between DR and NDR. NDR was characterized by enrichment of inflammatory, angiogenic, and immune-suppressive EV proteins, such as IL1RL1, TFRC, IL6ST, galectins, TNF superfamily death receptors, chemokines, and PCSK9. Pathway analysis revealed enrichment of angiogenesis, chemotaxis, ECM remodeling, and neutrophil degranulation associated with poor progression-free survival (PFS). In contrast, DR to ICI treatment was associated with EV proteins related to T- and B-cell activation and adaptive immunity. Sex-related differences in abundance and association with PFS were observed for certain EV proteins, including IL1RL1 and TFRC. A six-protein EV model (IL1RL1, TFRC, ERI1, CCN5, IGFBPL1, and TNFRSF13C) demonstrated good prognostic performance for identifying NDR (AUC = 0.907) and stratified patients into three discrete risk groups.
Conclusions: High-plex EV proteomics revealed biologically coherent tumour-immune signaling programs that are associated with ICI treatment resistance. Profiling circulating EVs may improve our understanding of EV-mediated immune evasion mechanisms and identify protein signatures that reflect the tumour immune microenvironment and predict response to immune checkpoint blockade.

20

The CRAC channel inhibitor Auxora interrupts inflammatory circuits between alveolar macrophages and T cells in patients with viral pneumonia

Casalino-Matsuda, S. M.; Guggilla, V.; Gao, C. A.; Demeulenaere, K. E.; Cusick, L. P.; Fenske, S. W.; Yu, Z.; Lu, Z.; Swaminathan, S.; Grant, R. A.; Schleck, M. J.; Prakriya, M.; Hebbar, S.; Stauderman, K.; Donnelly, H. K.; Pickens, C.; Morales-Nebreda, L.; The NU SCRIPT Study Investigators; Wunderink, R. G.; Misharin, A. V.; Singer, B. D.; Budinger, G. S.

2026-05-30 respiratory medicine 10.64898/2026.05.27.26354034 medRxiv

Top 6%

0.0%

Viral pneumonia is perpetuated by inflammatory circuits between activated T cells and monocyte-derived alveolar macrophages (MoAM). T cells and macrophages express ORAI1 and STIM1, which form calcium release-activated calcium (CRAC) channels that allow extracellular calcium entry in response to endoplasmic reticulum calcium store depletion. In a randomized, placebo-controlled, multicenter phase 2 trial (CARDEA), Auxora, a CRAC channel inhibitor, reduced all-cause 30-day mortality by 56% in patients with severe SARS-CoV-2 pneumonia. Here, we report a multi-omics analysis of serially collected alveolar samples from unvaccinated patients with severe SARS-CoV-2 pneumonia treated with Auxora versus placebo. We found reductions in plasma levels of the monocyte- and T cell-chemokines, CCL8 and PDGF-AA. Using peripheral blood mononuclear cells (PBMC) from healthy volunteers, we show that Auxora directly targets T cells to inhibit the transcription of CCL8 and PDGFA in monocyte-derived macrophages, supporting a mechanism for its effects and a potential intermediate biomarker of efficacy.